Add bidirectional packet capture #6882

AryanBakliwal · 2024-12-23T04:18:38Z

fixes: #6862

Added bidirectional field in packet capture CR spec.
For testing, I created two pods and pinged one from the other.

Screenshot of the .pcapng output file.

Signed-off-by: Aryan Bakliwal <[email protected]>

AryanBakliwal · 2024-12-23T04:20:04Z

@hangyan please let me know what you think about this

pkg/agent/packetcapture/packetcapture_controller.go

pkg/apis/crd/v1alpha1/types.go

…antrea-io#6877) Bumps the golang-org-x group with 1 update: [golang.org/x/net](https://github.com/golang/net). Updates `golang.org/x/net` from 0.32.0 to 0.33.0 - [Commits](golang/net@v0.32.0...v0.33.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-type: direct:production update-type: version-update:semver-minor dependency-group: golang-org-x ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…oup (antrea-io#6880) Bumps the ginkgo group with 1 update: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo). Updates `github.com/onsi/ginkgo/v2` from 2.22.0 to 2.22.1 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.22.0...v2.22.1) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: ginkgo ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Signed-off-by: Aryan Bakliwal <[email protected]>

AryanBakliwal · 2025-01-01T13:13:56Z

Hey @hangyan @antoninbas, I made the changes in the bpf code according to the one generated by tcpdump.

When I try to test the bidirectional packet capture, it fails with this log

E1229 11:44:58.723258       1 packetcapture_controller.go:424] "PacketCapture failed capturing packets" err="invalid argument" name="pc-test"

Could you please take a look and help me identify what might be going wrong?

Also, golangci-lint is giving me this error even though I have changed the Capture method everywhere.

===> Running golangci (linux) <===
===> Running golangci (windows) <===
cmd/antrea-agent/agent.go:64:2: could not import antrea.io/antrea/pkg/agent/packetcapture (-: # antrea.io/antrea/pkg/agent/packetcapture
pkg/agent/packetcapture/packetcapture_controller.go:170:23: cannot use capture (variable of type *capture.pcapCapture) as PacketCapturer value in assignment: *capture.pcapCapture does not implement PacketCapturer (wrong type for method Capture)
                have Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet) (chan gopacket.Packet, error)
                want Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet, bool) (chan gopacket.Packet, error)) (typecheck)
        "antrea.io/antrea/pkg/agent/packetcapture"
        ^
pkg/agent/packetcapture/capture_interface.go:1: : # antrea.io/antrea/pkg/agent/packetcapture [antrea.io/antrea/pkg/agent/packetcapture.test]
pkg/agent/packetcapture/packetcapture_controller.go:170:23: cannot use capture (variable of type *capture.pcapCapture) as PacketCapturer value in assignment: *capture.pcapCapture does not implement PacketCapturer (wrong type for method Capture)
                have Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet) (chan gopacket.Packet, error)
                want Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet, bool) (chan gopacket.Packet, error) (typecheck)
// Copyright 2024 Antrea Authors.
make: *** [Makefile:329: golangci] Error 1

hangyan · 2025-01-02T02:29:30Z

Hey @hangyan @antoninbas, I made the changes in the bpf code according to the one generated by tcpdump.

When I try to test the bidirectional packet capture, it fails with this log

E1229 11:44:58.723258       1 packetcapture_controller.go:424] "PacketCapture failed capturing packets" err="invalid argument" name="pc-test"

Could you please take a look and help me identify what might be going wrong?

Also, golangci-lint is giving me this error even though I have changed the Capture method everywhere.

===> Running golangci (linux) <===
===> Running golangci (windows) <===
cmd/antrea-agent/agent.go:64:2: could not import antrea.io/antrea/pkg/agent/packetcapture (-: # antrea.io/antrea/pkg/agent/packetcapture
pkg/agent/packetcapture/packetcapture_controller.go:170:23: cannot use capture (variable of type *capture.pcapCapture) as PacketCapturer value in assignment: *capture.pcapCapture does not implement PacketCapturer (wrong type for method Capture)
                have Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet) (chan gopacket.Packet, error)
                want Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet, bool) (chan gopacket.Packet, error)) (typecheck)
        "antrea.io/antrea/pkg/agent/packetcapture"
        ^
pkg/agent/packetcapture/capture_interface.go:1: : # antrea.io/antrea/pkg/agent/packetcapture [antrea.io/antrea/pkg/agent/packetcapture.test]
pkg/agent/packetcapture/packetcapture_controller.go:170:23: cannot use capture (variable of type *capture.pcapCapture) as PacketCapturer value in assignment: *capture.pcapCapture does not implement PacketCapturer (wrong type for method Capture)
                have Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet) (chan gopacket.Packet, error)
                want Capture(context.Context, string, int, "net".IP, "net".IP, *"antrea.io/antrea/pkg/apis/crd/v1alpha1".Packet, bool) (chan gopacket.Packet, error) (typecheck)
// Copyright 2024 Antrea Authors.
make: *** [Makefile:329: golangci] Error 1

this error was reported by golangci on windows, we have a pcap_unsupported.go for windows/... interface, it's an non-op since we didn't support this on windows yet, you should change that too.

pkg/agent/packetcapture/capture/bpf.go

Signed-off-by: Aryan Bakliwal <[email protected]>

AryanBakliwal · 2025-01-05T18:52:56Z

I’ve added the direction field with enum values (SourceToDestination, DestinationToSource, and Both) and set the default to SourceToDestination. Additionally, I’ve updated the YAML manifests, packet capture guide and tests accordingly.

Please let me know if any adjustments are needed or if there's anything else I should update or add.

Captured packets

ICMP (SourceToDestination)

ICMP (DestinationToSource)

ICMP (Both)

TCP (SourceToDestination)

TCP (DestinationToSource)

TCP (Both)

pkg/agent/packetcapture/capture/bpf.go

pkg/agent/packetcapture/capture/bpf_test.go

pkg/apis/crd/v1alpha1/types.go

antoninbas

could we add an e2e test with the Both direction?

pkg/agent/packetcapture/capture/bpf.go

pkg/apis/crd/v1alpha1/types.go

build/yamls/antrea.yml

pkg/apis/crd/v1alpha1/types.go

Signed-off-by: Aryan Bakliwal <[email protected]>

AryanBakliwal · 2025-01-19T22:21:13Z

@antoninbas @hangyan I have refactored the compilePacketFilter function, added a test case for dstPort and Both in bpf_test.go, generated the manifests by running make manifest and added an e2e test with the Both direction.
Could you please review the changes? Let me know if there’s anything else I can improve.

antoninbas

thanks for your ongoing work on this

pkg/agent/packetcapture/capture/bpf.go

antoninbas · 2025-01-28T05:30:17Z

pkg/agent/packetcapture/capture/bpf.go

+// (004) ld       [26]									   # Load 4B at 26 (source address)
+// (005) jeq      #0xaf40102       jt 6	jf 15			   # If bytes match(10.244.0.2), goto #6, else #15
+// (006) ld       [30]									   # Load 4B at 30 (dest address)
+// (007) jeq      #0xaf40103       jt 8	jf 26			   # If bytes match(10.244.0.3), goto #8, else #26


would the code be easier to generate if we jumped to (015) in case this instruction (007) was not a match?

the fact that we know we can skip the second or operand in this case is an optimization. Not saying it's not a good optimization, but if using a "fixed" offset makes code generation much easier, maybe we should do it for now.

I'm thinking about the discussion we just had during the community meeting. That being said, I can see why generating the same code as tcpdump is valuable.

@AryanBakliwal Please take a look at this ^

Taking a concrete example:

tcpdump -d "ip proto 6 and ((src host 10.244.1.2 and dst host 10.244.1.3) or (src host 10.244.1.3 and dst host 10.244.1.2))"

This is the code generated by tcpdump:

(000) ldh [12] (001) jeq #0x800 jt 2 jf 12 (002) ldb [23] (003) jeq #0x6 jt 4 jf 12 (004) ld [26] (005) jeq #0xaf40102 jt 6 jf 8 (006) ld [30] (007) jeq #0xaf40103 jt 11 jf 12 (008) jeq #0xaf40103 jt 9 jf 12 (009) ld [30] (010) jeq #0xaf40102 jt 11 jf 12 (011) ret #262144 (012) ret #0

If we generate this code instead, I believe it will be simpler:

(000) ldh [12] (001) jeq #0x800 jt 2 jf 12 (002) ldb [23] (003) jeq #0x6 jt 4 jf 12 (004) ld [26] (005) jeq #0xaf40102 jt 6 jf 8 (006) ld [30] (007) jeq #0xaf40103 jt 11 jf 8 <- less efficient than tcpdump's version but easier to implement? (008) jeq #0xaf40103 jt 9 jf 12 (009) ld [30] (010) jeq #0xaf40102 jt 11 jf 12 (011) ret #262144 (012) ret #0

@antoninbas Actually, the real problem is the calculation of skipTrue (for which I created the function calculateSkipTrue) when we want to accept the packet. But calculateSkipTrue depends on many parameters [srcPort, dstPort, direction, which port is loaded in the register (srcPort or dstPort) and if we are calling this function from the first half (instructions that check request) or the second half (instructions that check response)] to calculate the number of instructions to skip to jump to the returnKeep instruction. This makes the code generation very complex if generating instruction set exactly like tcpdump.

Taking each case:

ICMP (both srcPort and dstPort are 0)

tcpdump -d "ip proto 6 and ((src host 10.244.1.2 and dst host 10.244.1.3) or (src host 10.244.1.3 and dst host 10.244.1.2))"

Code generated by tcpdump:

(000) ldh [12] (001) jeq #0x800 jt 2 jf 12 (002) ldb [23] (003) jeq #0x6 jt 4 jf 12 (004) ld [26] (005) jeq #0xaf40102 jt 6 jf 8 (006) ld [30] (007) jeq #0xaf40103 jt 11 jf 12 <- need to jump to 11 from 7, difference to calculate = 3 (008) jeq #0xaf40103 jt 9 jf 12 (009) ld [30] (010) jeq #0xaf40102 jt 11 jf 12 <- skipTrue = 0 (011) ret #262144 (012) ret #0

TCP (srcPort = 0 and dstPort > 0)

tcpdump -d "ip proto 6 and ((src host 10.244.1.2 and dst host 10.244.1.3 and dst port 80) or (src host 10.244.1.3 and dst host 10.244.1.2 and src port 80))"

Code generated by tcpdump:

(000) ldh [12] (001) jeq #0x800 jt 2 jf 22 (002) ldb [23] (003) jeq #0x6 jt 4 jf 22 (004) ld [26] (005) jeq #0xaf40102 jt 6 jf 13 (006) ld [30] (007) jeq #0xaf40103 jt 8 jf 22 <- jump to next instruction since port(s) not zero (008) ldh [20] (009) jset #0x1fff jt 22 jf 10 (010) ldxb 4*([14]&0xf) (011) ldh [x + 16] (012) jeq #0x50 jt 21 jf 22 <- need to jump to 21 from 12, difference to calculate = 8 (013) jeq #0xaf40103 jt 14 jf 22 (014) ld [30] (015) jeq #0xaf40102 jt 16 jf 22 (016) ldh [20] (017) jset #0x1fff jt 22 jf 18 (018) ldxb 4*([14]&0xf) (019) ldh [x + 14] (020) jeq #0x50 jt 21 jf 22 <- skipTrue = 0 (021) ret #262144 (022) ret #0

Note that this is similar to when (srcPort > 0 and dstPort = 0) and (srcPort > 0 and dstPort > 0).

Since the problem is to jump to the returnKeep instruction, why not add it again to mitigate this complex calculation and jump to it easily. For example:

ICMP:
Code that can be generated more simply

(000) ldh [12] (001) jeq #0x800 jt 2 jf 12 (002) ldb [23] (003) jeq #0x6 jt 4 jf 12 (004) ld [26] (005) jeq #0xaf40102 jt 6 jf 8 (006) ld [30] (007) jeq #0xaf40103 jt 8 jf 13 <- just skip to next instruction if true (008) ret #262144 (009) jeq #0xaf40103 jt 10 jf 13 (010) ld [30] (011) jeq #0xaf40102 jt 12 jf 13 (012) ret #262144 (013) ret #0

TCP (srcPort = 0 and dstPort > 0)
Code that can be generated more simply

(000) ldh [12] (001) jeq #0x800 jt 2 jf 23 (002) ldb [23] (003) jeq #0x6 jt 4 jf 23 (004) ld [26] (005) jeq #0xaf40102 jt 6 jf 13 (006) ld [30] (007) jeq #0xaf40103 jt 8 jf 23 (008) ldh [20] (009) jset #0x1fff jt 23 jf 10 (010) ldxb 4*([14]&0xf) (011) ldh [x + 16] (012) jeq #0x50 jt 13 jf 23 <- just skip to next instruction if true (013) ret #262144 (014) jeq #0xaf40103 jt 15 jf 23 (015) ld [30] (016) jeq #0xaf40102 jt 17 jf 23 (017) ldh [20] (018) jset #0x1fff jt 23 jf 19 (019) ldxb 4*([14]&0xf) (020) ldh [x + 14] (021) jeq #0x50 jt 22 jf 23 (022) ret #262144 (023) ret #0

Let me know what you think about this approach

Adding an extra return instruction sounds fine to me.

I think what we are going for is some code that looks like this in the end:

if direction == SourceToDest { generateFilters(offset, srcIP, dstIP, srcPort, dstPort) } else if direction == DestToSource { generateFilters(offset, dstIP, srcIP, dstPort, srcPort) } else { generateFilters(offset, srcIP, dstIP, srcPort, dstPort) // ... generateFilters(offset, dstIP, srcIP, dstPort, srcPort) }

So if we can get there and have some generate code that looks reasonable and works, that would be good.

@antoninbas I have made the requested changes and refactored the code as you suggested and as per our discussion. PTAL and let me know if there is anything else to update or add.

Signed-off-by: Aryan Bakliwal <[email protected]>

antoninbas · 2025-02-03T21:51:09Z

pkg/agent/packetcapture/capture/bpf.go

+// (012) jeq      #0x7b            jt 13	jf 27		   # TCP Source Port: If 123, goto #13, else #26
+// (013) ldh      [x + 16]								   # Load 2B at x+16 (TCP dst port)
+// (014) jeq      #0x7c            jt 15	jf 27		   # TCP dst port: If 123, goto #15, else #26
+// (015) ret      #262144								   # MATCH


Maybe a stupid question, but if this is the only change we have, it doesn't seem worth it? Why can't we calculate the jump size using size - curOffset - 1 given that:

we always compute the full size up front

we always know the current offset ("where we are")

the instruction before last is always a MATCH return instruction

What am I missing?

@antoninbas You're right, the jump size can indeed be calculated easily with the available information (full size and current offset).
The main challenge, however, isn't the calculation itself but rather when to perform it, meaning in which instructions do we need skipTrue set to size - curOffset - 1.
And if a helper function is used for calculation, then it depends on which instruction is calling the helper function, what is the direction and what is the value of ports. Adding this extra instruction helps simplify the instruction set generation and avoids unnecessary complexity.

The challenges I faced:

For request packets in Both direction, there are 3 cases (assuming the 3 instructions - dstIP. srcPort and srcPort comparison instructions are calling a helper function to get jump size/skipTrue value):

When both ports are 0, the destination IP comparison instruction performs the calculation to jump to MATCH.

When only one port is 0, the destination IP comparison sets skipTrue to 0, and the comparison for the defined port performs the calculation to jump to MATCH.

When neither port is 0, all comparisons set skipTrue to 0, except for the destination port comparison, which performs the calculation to jump to MATCH.

In SourceToDestination, DestinationToSource and response packets in Both, no calculations are needed, and these 3 instructions (dstIP, srcPort and dstPort comparison) jump to MATCH step with skipTrue set to 0.

The helper function that calculates the jump size or returns the value of skipTrue requires additional context which is a bunch of parameters such as the calling instruction, direction, ports, size and current offset. Adding this logic in the code makes it difficult to extend in the future if we filter traffic on more parameters and also increases complexity.

Hope that this explanation makes sense and adding an extra MATCH instruction seems reasonable. Any suggestions on how we could simplify or approach this differently? I'm open to any ideas you might have for improving the implementation of calculation of skipTrue value.

hangyan · 2025-02-05T07:10:52Z

pkg/agent/packetcapture/capture/bpf.go

+// 'ip proto 6 and ((src host 10.244.1.2 and dst host 10.244.1.3 and src port 123 and dst port 124) or (src host 10.244.1.3 and dst host 10.244.1.2 and src port 124 and dst port 123))'
+// And using `tcpdump -i <device> '<filter>' -d` will generate the following BPF instructions:
+// (000) ldh      [12]									   # Load 2B at 12 (Ethertype)
+// (001) jeq      #0x800           jt 2	jf 26			   # Ethertype: If IPv4, goto #2, else #26


in comment section, else #26 should be else $27？

hangyan · 2025-02-05T08:05:59Z

pkg/agent/packetcapture/capture/bpf.go

+	return count
+}
+
+func compareIPPort(srcAddrVal, dstAddrVal uint32, size, curLen uint8, srcPort, dstPort uint16, useSkipFalse bool) []bpf.Instruction {


this function name is not very straightforward, can you use a new one? we may also need some comments.

Also i think the param useSkipFalse maybe misleading, we always use skip false, the difference is skipFalse to ret #0 or skipFalse to some intermediate instructions.

AryanBakliwal added 2 commits December 23, 2024 00:26

add bidirection flag

875dde4

Signed-off-by: Aryan Bakliwal <[email protected]>

add bidirectional capture

1307c17

Signed-off-by: Aryan Bakliwal <[email protected]>

antoninbas reviewed Dec 23, 2024

View reviewed changes

pkg/agent/packetcapture/packetcapture_controller.go Outdated Show resolved Hide resolved

pkg/apis/crd/v1alpha1/types.go Outdated Show resolved Hide resolved

dependabot bot and others added 3 commits January 1, 2025 18:39

update code and tests

79a26c0

Signed-off-by: Aryan Bakliwal <[email protected]>

hangyan reviewed Jan 2, 2025

View reviewed changes

pkg/agent/packetcapture/capture/bpf.go Outdated Show resolved Hide resolved

AryanBakliwal added 2 commits January 5, 2025 19:42

Merge branch 'main' into bidirection

6e5898c

add enum, tests and docs

e5ef842

Signed-off-by: Aryan Bakliwal <[email protected]>

AryanBakliwal requested review from antoninbas, hangyan and tnqn January 5, 2025 18:53

hangyan reviewed Jan 8, 2025

View reviewed changes

pkg/agent/packetcapture/capture/bpf.go Outdated Show resolved Hide resolved

pkg/agent/packetcapture/capture/bpf.go Outdated Show resolved Hide resolved

pkg/agent/packetcapture/capture/bpf_test.go Show resolved Hide resolved

pkg/apis/crd/v1alpha1/types.go Show resolved Hide resolved

antoninbas reviewed Jan 8, 2025

View reviewed changes

pkg/agent/packetcapture/capture/bpf.go Show resolved Hide resolved

pkg/apis/crd/v1alpha1/types.go Outdated Show resolved Hide resolved

build/yamls/antrea.yml Show resolved Hide resolved

pkg/apis/crd/v1alpha1/types.go Show resolved Hide resolved

refactor and add e2e test

7eb34de

Signed-off-by: Aryan Bakliwal <[email protected]>

AryanBakliwal force-pushed the bidirection branch from 7b17816 to 7eb34de Compare January 19, 2025 22:07

Merge branch 'main' into bidirection

5d04e5d

AryanBakliwal requested review from hangyan and antoninbas January 19, 2025 22:21

antoninbas reviewed Jan 21, 2025

View reviewed changes

antoninbas reviewed Jan 28, 2025

View reviewed changes

AryanBakliwal added 2 commits February 1, 2025 09:52

refactor

e1a1826

Signed-off-by: Aryan Bakliwal <[email protected]>

Merge branch 'main' into bidirection

077484e

AryanBakliwal requested a review from antoninbas February 1, 2025 06:29

antoninbas reviewed Feb 3, 2025

View reviewed changes

hangyan reviewed Feb 5, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add bidirectional packet capture #6882

Add bidirectional packet capture #6882

AryanBakliwal commented Dec 23, 2024 •

edited

Loading

AryanBakliwal commented Dec 23, 2024

AryanBakliwal commented Jan 1, 2025

hangyan commented Jan 2, 2025

AryanBakliwal commented Jan 5, 2025

antoninbas left a comment

AryanBakliwal commented Jan 19, 2025

antoninbas left a comment

antoninbas Jan 28, 2025

antoninbas Jan 28, 2025

AryanBakliwal Jan 30, 2025

antoninbas Jan 31, 2025

AryanBakliwal Feb 1, 2025

antoninbas Feb 3, 2025

AryanBakliwal Feb 4, 2025

hangyan Feb 5, 2025

hangyan Feb 5, 2025

Add bidirectional packet capture #6882

Are you sure you want to change the base?

Add bidirectional packet capture #6882

Conversation

AryanBakliwal commented Dec 23, 2024 • edited Loading

AryanBakliwal commented Dec 23, 2024

AryanBakliwal commented Jan 1, 2025

hangyan commented Jan 2, 2025

AryanBakliwal commented Jan 5, 2025

Captured packets

antoninbas left a comment

Choose a reason for hiding this comment

AryanBakliwal commented Jan 19, 2025

antoninbas left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

The challenges I faced:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AryanBakliwal commented Dec 23, 2024 •

edited

Loading